Search - java tag

Search list

[Internet-Network] Writing an HTML File Analyzer in Java

Description:

Writing an HTML File Analyzer in Java

 I. Overview

    

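    The core of a Web server is correctly analyzing the tags in an HTML file, and an interpreter for a programming language likewise analyzes the reserved words in a source file before interpreting it. In practice, we often need to run a keyword analysis on some particular type of file. For example, to download an HTML file together with the related .gif, .class, and other files, we must separate out the tags in the HTML file and find the required file names and directories. Before Java, this kind of work meant analyzing every character in the file to pick out the needed parts, which took a lot of code and was error-prone. In a recent project, the author used Java's input stream class StreamTokenizer to analyze HTML files, with good results. The goal here is to download an HTML file from a known Web page, analyze it, and then download the HTML files (if the page uses frames), image files, and class (Java applet) files that the page contains.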

    

    II. StreamTokenizer

    

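    StreamTokenizer, the tokenizing input stream, turns an input stream into a stream of tokens. Tokens fall into three categories: words (multi-character tokens), single-character tokens, and white space (including comments in Java and C/C++).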

    

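    The constructor of the StreamTokenizer class is: StreamTokenizer(InputStream in). This form has been deprecated since JDK 1.1 in favor of StreamTokenizer(Reader r), which is what the HtmlTokenizer constructor later in this article relies on when it passes a BufferedReader.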

    

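    The class has several public instance variables: ttype, sval, and nval, which hold the token type, the current string value, and the current numeric value. To get the characters between tokens (that is, between HTML tags), read the variable sval. To advance to the next token, call nextToken(). nextToken() returns an int, which is one of the four constants below or, for a single-character token, the character itself: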

    

    StreamTokenizer.TT_NUMBER: the token read is a number; its value is a double and can be read from the instance variable nval.

    

    StreamTokenizer.TT_WORD: the token read is a non-numeric word (which may include other characters); the word can be read from the instance variable sval.

    

    StreamTokenizer.TT_EOL: the token read is an end-of-line. It is only returned after eolIsSignificant(true) has been called.

    

    StreamTokenizer.TT_EOF: returned by nextToken() when the end of the stream has been reached.
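To make the return values concrete, here is a minimal sketch that runs the default syntax table over a short string and labels each token. The class name TokenTypesDemo and the method describe are illustrative names, not part of the original article.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenTypesDemo {
    // Reads every token from the text and labels it by the value nextToken() returned.
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true); // report line ends as TT_EOL instead of skipping them
        List<String> out = new ArrayList<>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                switch (type) {
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval); // nval is a double
                        break;
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_EOL:
                        out.add("EOL");
                        break;
                    default:
                        // a single-character token: the return value is the character itself
                        out.add("CHAR:" + (char) type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : describe("width = 42\nend")) {
            System.out.println(s);
        }
    }
}
```

For the input in main, the tokens come out as a word, the character '=', a number, an end-of-line, and a final word.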

    

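    Before calling nextToken(), set up the stream's syntax table so that the parser can tell the different kinds of characters apart. The whitespaceChars(int low, int hi) method defines the range of characters that carry no meaning. The wordChars(int low, int hi) method defines the range of characters that make up words.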
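As a rough illustration of a hand-built syntax table, the sketch below clears the defaults with resetSyntax() and then declares which characters are white space, which form words, and which stand alone. The class name SyntaxTableDemo, the method tokens, and the sample input are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SyntaxTableDemo {
    // Splits attribute text such as "src=logo.gif width=100" into words and '=' characters.
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();          // start from an empty table: every character is ordinary
        st.whitespaceChars(0, 32); // control characters and space separate tokens
        st.wordChars(33, 126);     // all printable ASCII characters can form words
        st.ordinaryChar('=');      // except '=', which is returned on its own
        List<String> out = new ArrayList<>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else {
                    out.add("CHAR:" + (char) type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("src=logo.gif width=100"));
    }
}
```

Because resetSyntax() also switches off number parsing, "100" is reported as a word here rather than as TT_NUMBER.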

    

    III. Implementation

    

    1. Implementing the HtmlTokenizer class

    

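    Before analyzing a token stream, first set up its syntax table; in this example, that means letting the program work out which words are HTML tags. Below is the token stream class for the HTML tags we need; it is a subclass of StreamTokenizer: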

    

    

    import java.io.*;

    class HtmlTokenizer extends StreamTokenizer {
        // Token codes. Only the tags needed for this example are defined;
        // extend the list as required.
        static final int HTML_TEXT = -1;
        static final int HTML_UNKNOWN = -2;
        static final int HTML_EOF = -3;
        static final int HTML_IMAGE = -4;
        static final int HTML_FRAME = -5;
        static final int HTML_BACKGROUND = -6;
        static final int HTML_APPLET = -7;

        boolean outsideTag = true; // true while the parser is outside a <...> tag

        // Constructor: defines the syntax table of this token stream.
        public HtmlTokenizer(BufferedReader r) {
            super(r);
            this.resetSyntax();      // reset the syntax table
            this.wordChars(0, 255);  // every character can be part of a word
            this.ordinaryChar('<');  // delimiters on either side of an HTML tag
            this.ordinaryChar('>');
        } // end of constructor

        public int nextHtml() {
            int token; // current token
            try {
                switch (token = this.nextToken()) {
                case StreamTokenizer.TT_EOF:
                    // end of the stream reached
                    return HTML_EOF;
                case '<': // entering a tag
                    outsideTag = false;
                    return nextHtml();
                case '>': // leaving a tag
                    outsideTag = true;
                    return nextHtml();
                case StreamTokenizer.TT_WORD:
                    // the token is a word: work out which tag it is
                    if (allWhite(sval))
                        return nextHtml(); // skip runs of white space
                    else if (sval.toUpperCase().indexOf("FRAME") != -1 && !outsideTag)
                        return HTML_FRAME;      // FRAME tag
                    else if (sval.toUpperCase().indexOf("IMG") != -1 && !outsideTag)
                        return HTML_IMAGE;      // IMG tag
                    else if (sval.toUpperCase().indexOf("BACKGROUND") != -1 && !outsideTag)
                        return HTML_BACKGROUND; // BACKGROUND attribute
                    else if (sval.toUpperCase().indexOf("APPLET") != -1 && !outsideTag)
                        return HTML_APPLET;     // APPLET tag
                    else if (outsideTag)
                        // Assumed intent: the original fell through to default here
                        // and never returned HTML_TEXT.
                        return HTML_TEXT;
                    // any other tag falls through to default
                default:
                    System.out.println("Unknown tag: " + token);
                    return HTML_UNKNOWN;
                } // end of switch
            } catch (IOException e) {
                System.out.println("Error:" + e.getMessage());
            }
            return HTML_UNKNOWN;
        } // end of nextHtml

        // Returns true if the string consists only of white space.
        protected boolean allWhite(String s) {
            // The original omitted the body; this is a minimal stand-in.
            return s.trim().length() == 0;
        } // end of allWhite

    } // end of class
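The class above only reports which kind of tag was seen; the file name still has to be pulled out of the tag text held in sval. The sketch below shows one possible way to do that with the same syntax table. It is not the author's code: the class TagScanDemo, the helper attributeValue, and the regular expression are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagScanDemo {
    // Hypothetical helper: value of attribute `name` inside the raw tag text, or null.
    static String attributeValue(String tagText, String name) {
        Matcher m = Pattern
                .compile("\\b" + name + "\\s*=\\s*\"?([^\"\\s>]+)", Pattern.CASE_INSENSITIVE)
                .matcher(tagText);
        return m.find() ? m.group(1) : null;
    }

    // Collects the files referenced by FRAME, IMG, BODY and APPLET tags in the given HTML.
    static List<String> referencedFiles(String html) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(html));
        st.resetSyntax();       // same table as the HtmlTokenizer constructor
        st.wordChars(0, 255);
        st.ordinaryChar('<');
        st.ordinaryChar('>');
        List<String> files = new ArrayList<>();
        boolean insideTag = false;
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == '<') {
                    insideTag = true;
                } else if (type == '>') {
                    insideTag = false;
                } else if (type == StreamTokenizer.TT_WORD && insideTag) {
                    String tag = st.sval.trim();
                    String upper = tag.toUpperCase();
                    String value = null;
                    if (upper.startsWith("IMG") || upper.startsWith("FRAME")) {
                        value = attributeValue(tag, "src");
                    } else if (upper.startsWith("BODY")) {
                        value = attributeValue(tag, "background");
                    } else if (upper.startsWith("APPLET")) {
                        value = attributeValue(tag, "code");
                    }
                    if (value != null) {
                        files.add(value);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        String html = "<frame src=\"menu.html\"><img src=\"logo.gif\" width=10>"
                + "<applet code=\"Clock.class\"></applet>";
        System.out.println(referencedFiles(html));
    }
}
```

The returned names are relative to the page, so a downloader would still need to resolve them against the page's URL before fetching them.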

    

    The approach above was tested in a recent project on Windows NT 4, using Inprise JBuilder 3 as the development tool.


Platform: | Size: 1066 | Author: tiberxu | Hits:

[Other resource] JSP.Tag.Defined.Reference

Description: A good reference book for Java Web development that teaches custom JSP tags.
Platform: | Size: 170621 | Author: 小科 | Hits:

[Other resource] Oreilly.Java.Server.Pages.3rd.Edition.eBook-LiB.ra

Description: JavaServer Pages, Third Edition is completely revised and updated to cover the substantial changes in the 2.0 version of the JSP specification. It also includes detailed coverage of the major revisions to the JSP Standard Tag Library (JSTL) specification. Combining plenty of practical advice with detailed coverage of JSP syntax and features and clear, useful examples, JavaServer Pages, Third Edition demonstrates how to embed server-side Java into Web pages, while also covering important topics such as JavaBeans, Enterprise JavaBeans (EJB), and JDBC database access.
Platform: | Size: 2199498 | Author: 杨阳 | Hits:

[Web Server] Appenture JSP Database Custom Tag Library

Description: Appenture JSP Database Custom Tag Library is a powerful JavaServer Pages tag library with features for both designers and programmers. It lets non-programmers work easily with the data in a database without programming skills, and it lets programmers develop and modify the database without changing the data-display code.
Platform: | Size: 60047 | Author: 邓耀 | Hits:

[Web Server] Appenture JSP Database Custom Tag Library

Description: Appenture JSP Database Custom Tag Library is a powerful JavaServer Pages tag library with features for both designers and programmers. It lets non-programmers work easily with the data in a database without programming skills, and it lets programmers develop and modify the database without changing the data-display code.
Platform: | Size: 59392 | Author: 邓耀 | Hits:

[Windows Develop] bindows1.3 evaluation version

Description: Bindows is a framework for building powerful thin-client applications. Bindows applications run in modern Web browsers, where they use DHTML to render a rich graphical user interface (GUI) that can contain many different widgets. Bindows applications can interact with the server in many ways, most of them XML-based; XML-RPC and SOAP-based Web Services are also supported. The programming language is JavaScript. It emulates all the Windows controls: buttons, labels, lists, text boxes, dialogs, colors, styles, and so on, so every control and style a typical desktop application needs is available. The new version, 1.30 beta, adds the long-awaited Theme support. Erik and Emil are world-class JavaScript developers: a language originally used only for browser scripting is pushed here close to the limit of what browser JavaScript can do.
Platform: | Size: 1219584 | Author: shen | Hits:

[ERP-EIP-OA-Portal] ef_src_0[1].90

Description: A lightweight J2EE Web application framework that has been used in several real projects. It is built on Struts, Hibernate, XML, and related technologies, and offers rich tag, role, navigation, session, dictionary, and other features.
Platform: | Size: 1055744 | Author: 刘一 | Hits:

[JSP] jspbook3

Description: Source code for the book "JSP Design (Third Edition)". The third edition has been fully revised and updated to cover the JSP 2.0 and JSTL 1.1 specifications. It covers in detail the Expression Language (EL) new in JSP 2.0, the JSTL 1.1 tag libraries and the new function library, the new tag file format that supports developing custom tag libraries without Java code, the simplified Java tag library API, improvements to the JSP XML syntax, and more. It also describes in detail setting up the Apache Tomcat server, JSP and JSTL syntax and features, error handling and debugging, authentication and personalization, database access, XML processing, internationalization, and many other topics.
Platform: | Size: 3379200 | Author: 罗冬 | Hits:

[Applications] afuer

Description: This platform takes Web 2.0 as its baseline standard, uses Spring + Hibernate as its basic application architecture, and builds its view layer from various AJAX + tag components.
Platform: | Size: 12325888 | Author: 付京周 | Hits:

[JSP] JSP.Tag.Defined.Reference

Description: A good reference book for Java Web development that teaches custom JSP tags.
Platform: | Size: 169984 | Author: 小科 | Hits:

[xml-soap-webservice] OReilly.JavaScript.The.Definitive.Guide.5th.Editio

Description: This Fifth Edition is completely revised and expanded to cover JavaScript as it is used in today's Web 2.0 applications. This book is both an example-driven programmer's guide and a keep-on-your-desk reference, with new chapters that explain everything you need to know to get the most out of JavaScript, including: scripted HTTP and Ajax; XML processing; client-side graphics using the <canvas> tag; namespaces in JavaScript (essential when writing complex programs); classes, closures, persistence, Flash, and JavaScript embedded in Java applications.
Platform: | Size: 2291712 | Author: | Hits:

[JSP/Java] SubjectSpider_ByKelvenJU

Description: 1. Crawls pages on a chosen topic; 2. Produces a plain-text log in the format: timestamp, URL; 3. Opens at most two connections when crawling a given URL (note: the number of local threads parsing pages is unlimited); 4. Follows polite-spider rules: it must check the robots.txt file and meta tags for restrictions, and each thread sleeps 2 seconds after fetching a page; 5. Parses HTML pages, extracts link URLs, detects whether an extracted URL has already been processed, and does not re-parse pages that have already been crawled; 6. Lets basic spider/crawler parameters be configured, including crawl depth and seed URLs; 7. Uses User-agent to identify itself to the server; 8. Produces crawl statistics, including crawl speed, time needed to complete the crawl, and total number of pages crawled; important variables and all classes and methods are commented; 9. Follows coding conventions, such as naming conventions for classes, methods, and files; 10. Optional: a GUI or Web interface for managing the spider/crawler, including start/stop and adding or removing URLs.
Platform: | Size: 1911808 | Author: | Hits:

[JSP] jsfbqku

Description: A complete guide to the JavaServer Faces tag library, Chinese edition.
Platform: | Size: 232448 | Author: pidn | Hits:

[JSP/Java] splitPage

Description: A JSP paging tag; very convenient.
Platform: | Size: 3072 | Author: | Hits:

[JSP] Oreilly.Java.Server.Pages.3rd.Edition.eBook-LiB.ra

Description: JavaServer Pages, Third Edition is completely revised and updated to cover the substantial changes in the 2.0 version of the JSP specification. It also includes detailed coverage of the major revisions to the JSP Standard Tag Library (JSTL) specification. Combining plenty of practical advice with detailed coverage of JSP syntax and features and clear, useful examples, JavaServer Pages, Third Edition demonstrates how to embed server-side Java into Web pages, while also covering important topics such as JavaBeans, Enterprise JavaBeans (EJB), and JDBC database access.
Platform: | Size: 2199552 | Author: 杨阳 | Hits:

[JSP/Java] ch12

Description: 1. This directory holds Web applications that demonstrate developing and using custom tags; they can be deployed directly to an application server and run. 2. shopping holds an online store program whose header.jsp uses a custom tag to display the current system date. 3. tag holds sample programs that demonstrate developing and using classic tags. 4. simple holds sample programs that demonstrate developing and using simple tags. Because WebLogic Server 8.1 does not support JSP 2.0, the simple programs cannot run under WebLogic Server 8.1.
Platform: | Size: 395264 | Author: lailijuan | Hits:

[JSP/Java] j-jspdwj

Description: Controls image size in a JSP page, implemented with a custom JSP tag.
Platform: | Size: 252928 | Author: fig | Hits:

[Other Web Code] Tag

Description: A custom Java paging tag; convenient, easy to use, and runs correctly.
Platform: | Size: 621568 | Author: 天才 | Hits:

[JSP/Java] JSP-tag

Description: A commonly used Java tag library covering the common JSP tags; easy for beginners to consult and very practical.
Platform: | Size: 20480 | Author: hello | Hits:

[Other] Java custom tags

Description: A detailed explanation of how to use Java custom tags, suited to advanced Spring MVC development.
Platform: | Size: 206848 | Author: skylinethj | Hits:

CodeBus www.codebus.net